
    Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users

    Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no ‘average biologist’ client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD), database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective if the system can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks.

    Ranking single nucleotide polymorphisms by potential deleterious effects


    Automating the extraction of essential genes from literature

    The construction of repositories with curated information about gene essentiality for organisms of interest in Biotechnology is a very relevant task, mainly in the design of cell factories for the enhanced production of added-value products. However, it requires retrieval and extraction of relevant information from literature, leading to high costs regarding manual curation. Text mining tools implementing methods addressing tasks such as information retrieval, named entity recognition and event extraction have been developed to automate and reduce the time required to obtain relevant information from literature in many biomedical fields. However, current tools are not designed or optimized for the purpose of identifying mentions of essential genes in scientific texts. This work is co-funded by the North Portugal Regional Operational Programme, under “Portugal 2020”, through the European Regional Development Fund (ERDF), within project SISBI (Ref. NORTE-01-0247-FEDER-003381). The Centre of Biological Engineering (CEB), University of Minho, sponsored all computational hardware and software required for this work.

    How to Get the Most out of Your Curation Effort

    Large-scale annotation efforts typically involve several experts who may disagree with each other. We propose an approach for modeling disagreements among experts that allows each annotation to be assigned a confidence value (i.e., the posterior probability that it is correct). Our approach allows computing a certainty level for individual annotations, given annotator-specific parameters estimated from data. We developed two probabilistic models for performing this analysis, compared these models using computer simulation, and tested each model's actual performance, based on a large data set generated by human annotators specifically for this study. We show that even in the worst-case scenario, when all annotators disagree, our approach allows us to significantly increase the probability of choosing the correct annotation. Along with this publication we make publicly available a corpus of 10,000 sentences annotated according to several cardinal dimensions that we have introduced in earlier work. The 10,000 sentences were all 3-fold annotated by a group of eight experts, while a 1,000-sentence subset was further 5-fold annotated by five new experts. While the presented data represent a specialized curation task, our modeling approach is general; most data annotation studies could benefit from our methodology.
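    The core computation can be sketched with a minimal independent-annotator model (an illustrative simplification, not either of the paper's two models): each annotator is assumed to report the true class with their own accuracy and to err uniformly over the remaining classes otherwise, and Bayes' rule combines the votes into a posterior over the candidate labels.

```python
def annotation_posterior(labels, accuracies, classes, prior=None):
    """Posterior P(true class | observed labels), assuming each
    annotator reports the true class with probability `accuracy`
    and otherwise errs uniformly over the remaining classes."""
    if prior is None:
        prior = {c: 1.0 / len(classes) for c in classes}
    scores = {}
    for c in classes:
        p = prior[c]
        for label, acc in zip(labels, accuracies):
            p *= acc if label == c else (1 - acc) / (len(classes) - 1)
        scores[c] = p
    total = sum(scores.values())
    return {c: p / total for c, p in scores.items()}

# Worst case from the abstract: all annotators disagree.  With
# hypothetical per-annotator accuracies, the votes can still be
# ranked instead of resolved blindly.
post = annotation_posterior(["fact", "method", "evidence"],
                            [0.9, 0.7, 0.6],
                            classes=["fact", "method", "evidence"])
```

    Here the most reliable annotator's label wins, but with a quantified confidence rather than a tie-break; the class names and accuracies are made up for illustration.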

    Measurement of 222Rn dissolved in water at the Sudbury Neutrino Observatory

    The technique used at the Sudbury Neutrino Observatory (SNO) to measure the concentration of 222Rn in water is described. Water from the SNO detector is passed through a vacuum degasser (in the light water system) or a membrane contact degasser (in the heavy water system) where dissolved gases, including radon, are liberated. The degasser is connected to a vacuum system which collects the radon on a cold trap and removes most other gases, such as water vapor and nitrogen. After roughly 0.5 tonnes of H2O or 6 tonnes of D2O have been sampled, the accumulated radon is transferred to a Lucas cell. The cell is mounted on a photomultiplier tube which detects the alpha particles from the decay of 222Rn and its daughters. The overall degassing and concentration efficiency is about 38% and the single-alpha counting efficiency is approximately 75%. The sensitivity of the radon assay system for D2O is equivalent to ~3 x 10^(-15) g U/g water. The radon concentration in both the H2O and D2O is sufficiently low that the rate of background events from U-chain elements is a small fraction of the interaction rate of solar neutrinos by the neutral current reaction. (Comment: 14 pages, 6 figures; v2 has very minor changes.)
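    As a rough illustration (assuming, which the abstract does not state explicitly, that the two quoted efficiencies are independent stages that simply multiply), the end-to-end efficiency of the assay chain would be:

```python
degas_eff = 0.38   # degassing + concentration efficiency (quoted ~38%)
count_eff = 0.75   # single-alpha counting efficiency (quoted ~75%)

# Assumed-independent stages multiply into an overall efficiency:
overall = degas_eff * count_eff   # 0.285, i.e. ~29% end-to-end
```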

    Figure Text Extraction in Biomedical Literature

    Background: Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge. However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Therefore, we are developing the Biomedical Figure Search engine.

    A radium assay technique using hydrous titanium oxide adsorbent for the Sudbury Neutrino Observatory

    As photodisintegration of deuterons mimics the disintegration of deuterons by neutrinos, the accurate measurement of the radioactivity from thorium and uranium decay chains in the heavy water in the Sudbury Neutrino Observatory (SNO) is essential for the determination of the total solar neutrino flux. A radium assay technique of the required sensitivity is described that uses hydrous titanium oxide adsorbent on a filtration membrane together with a beta-alpha delayed coincidence counting system. For a 200 tonne assay the detection limit for 232Th is a concentration of 3 x 10^(-16) g Th/g water and for 238U of 3 x 10^(-16) g U/g water. Results of assays of both the heavy and light water carried out during the first two years of data collection of SNO are presented. (Comment: 12 pages, 4 figures.)

    A bioinformatics knowledge discovery in text application for grid computing

    Background: A fundamental activity in biomedical research is Knowledge Discovery, the ability to search through large amounts of biomedical information such as documents and data. High-performance computational infrastructures, such as Grid technologies, are emerging as a possible means to tackle the intensive use of Information and Communication Technology (ICT) resources in the life sciences. The goal of this work was to develop a middleware solution that lets knowledge discovery applications exploit scalable, distributed computing systems and thereby make intensive use of ICT resources. Methods: The development of a grid application for Knowledge Discovery in Text (KDT) using a middleware-based methodology is presented. The system must be able to model a user application and split its processing into many parallel jobs distributed over the computational nodes; it must also track the available computational resources and their status, and monitor the execution of the parallel jobs. These operational requirements led to the design of a middleware that is specialized through user application modules. It includes a graphical user interface giving access to a node search system, a load-balancing system and a transfer optimizer that reduces communication costs. Results: A middleware prototype and its performance evaluation, in terms of the speed-up factor, are presented. It was written in Java on Globus Toolkit 4 to build the grid infrastructure from GNU/Linux grid nodes. A test was carried out on named entity recognition of symptoms and pathologies, applied to a collection of 5,000 scientific documents taken from PubMed. Conclusion: We discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process that extracts new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example, a Knowledge Discovery in Databases computation was applied to the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities.
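    The speed-up factor used in the evaluation can be sketched as follows; the timings and node count below are hypothetical, not taken from the paper:

```python
def speedup(t_serial, t_parallel):
    """Speed-up factor S(n) = T(1) / T(n): serial runtime over
    the runtime of the same workload split across n nodes."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_nodes):
    """Parallel efficiency E(n) = S(n) / n (1.0 = ideal scaling)."""
    return speedup(t_serial, t_parallel) / n_nodes

# Hypothetical timings for a 5,000-document NER run on 10 grid nodes:
s = speedup(3600.0, 450.0)          # 8.0x faster than one node
e = efficiency(3600.0, 450.0, 10)   # 0.8 -> 80% of ideal linear scaling
```

    The gap between speed-up and the node count reflects exactly the communication costs that the transfer optimizer in the middleware is meant to reduce.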

    Models of Temporal Enhanced Ultrasound Data for Prostate Cancer Diagnosis: The Impact of Time-Series Order

    Recent studies have shown the value of Temporal Enhanced Ultrasound (TeUS) imaging for tissue characterization in transrectal ultrasound-guided prostate biopsies. Here, we present results of experiments designed to study the impact of temporal order of the data in TeUS signals. We assess the impact of variations in temporal order on the ability to automatically distinguish benign prostate tissue from malignant tissue. We have previously used Hidden Markov Models (HMMs) to model TeUS data, as HMMs capture temporal order in time series. In the work presented here, we use HMMs to model malignant and benign tissues; the models are trained and tested on TeUS signals while introducing variation to their temporal order. We first model the signals in their original temporal order, followed by modeling the same signals under various time rearrangements. We compare the performance of these models for tissue characterization. Our results show that models trained over the original order-preserving signals perform statistically significantly better for distinguishing between malignant and benign tissues than those trained on rearranged signals. The performance degrades as the amount of temporal variation increases. Specifically, accuracy of tissue characterization decreases from 85%, using models trained on original signals, to 62%, using models trained and tested on signals that are completely temporally rearranged. These results indicate the importance of order in characterization of tissue malignancy from TeUS data.
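    The role of temporal order can be illustrated with a from-scratch first-order Markov sketch (far simpler than the paper's HMMs over TeUS signals, and using made-up discrete sequences): a model fitted to ordered sequences assigns the original ordering a much higher likelihood than a shuffled copy that has an identical symbol histogram, which is exactly the information shuffling destroys.

```python
import random
from math import log

def fit_transitions(seqs, n_symbols, alpha=1.0):
    """Estimate a first-order Markov transition matrix with
    add-alpha smoothing from discrete symbol sequences."""
    counts = [[alpha] * n_symbols for _ in range(n_symbols)]
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]

def loglik(seq, T):
    """Log-likelihood of the transitions in a sequence under T."""
    return sum(log(T[a][b]) for a, b in zip(seq, seq[1:]))

random.seed(0)
# Made-up "ordered" training data: strictly alternating 0/1 signals.
train = [[i % 2 for i in range(50)] for _ in range(20)]
T = fit_transitions(train, n_symbols=2)

test_seq = [i % 2 for i in range(50)]
shuffled = test_seq[:]
random.shuffle(shuffled)  # same histogram, temporal order destroyed
```

    Under the fitted model, `loglik(test_seq, T)` greatly exceeds `loglik(shuffled, T)` even though both sequences contain the same symbols, mirroring the paper's finding that order-preserving signals are easier to characterize than rearranged ones.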

    Measurement of the rate of nu_e + d --> p + p + e^- interactions produced by 8B solar neutrinos at the Sudbury Neutrino Observatory

    Solar neutrinos from the decay of ^8B have been detected at the Sudbury Neutrino Observatory (SNO) via the charged current (CC) reaction on deuterium and by the elastic scattering (ES) of electrons. The CC reaction is sensitive exclusively to nu_e's, while the ES reaction also has a small sensitivity to nu_mu's and nu_tau's. The flux of nu_e's from ^8B decay measured by the CC reaction rate is \phi^CC(nu_e) = 1.75 +/- 0.07 (stat.) +0.12/-0.11 (sys.) +/- 0.05 (theor.) x 10^6 /cm^2 s. Assuming no flavor transformation, the flux inferred from the ES reaction rate is \phi^ES(nu_x) = 2.39 +/- 0.34 (stat.) +0.16/-0.14 (sys.) x 10^6 /cm^2 s. Comparison of \phi^CC(nu_e) to the Super-Kamiokande Collaboration's precision value of \phi^ES(nu_x) yields a 3.3 sigma difference, providing evidence that there is a non-electron flavor active neutrino component in the solar flux. The total flux of active ^8B neutrinos is thus determined to be 5.44 +/- 0.99 x 10^6 /cm^2 s, in close agreement with the predictions of solar models. (Comment: 6 pages (LaTeX), 3 figures, submitted to Phys. Rev. Letters.)
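    A quick way to see how the quoted CC-flux uncertainties combine is to add them in quadrature, assuming the statistical, systematic, and theoretical components are independent (using the larger, upward side of the asymmetric systematic for illustration):

```python
from math import sqrt

# Quoted 1-sigma components of the CC flux, in units of 10^6 /cm^2 s:
stat, syst, theor = 0.07, 0.12, 0.05   # syst. is +0.12/-0.11; upper side used

# Independent uncertainties add in quadrature:
total = sqrt(stat**2 + syst**2 + theor**2)   # ~0.15 x 10^6 /cm^2 s
```

    So the CC measurement is roughly 1.75 +/- 0.15 (all sources) x 10^6 /cm^2 s under this simple combination; the 3.3 sigma significance quoted in the abstract additionally folds in the Super-Kamiokande ES uncertainty.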